Abstract. We consider the following variant of Huffman coding in which the costs of the letters, rather than the probabilities of the words, are non-uniform: "Given an alphabet of $r$ letters of non-uniform length, find a minimum-average-length prefix-free set of $n$ codewords over the alphabet;" equivalently, "Find an optimal $r$-ary search tree with $n$ leaves, where each leaf is accessed with equal probability but the cost to descend from a parent to its $i$th child depends on $i$." We show new structural properties of such codes, leading to an $O(n \log^2 r)$-time algorithm for finding them. This new algorithm is simpler and faster than the best previously known $O(nr \min\{\log n, r\})$-time algorithm, due to Perl, Garey, and Even.

Key words. Algorithms, Huffman Codes, Prefix Codes, Trees. 

AMS subject classification. Analysis of Algorithms. 

1. Introduction. The well-known Huffman coding problem [?] is the following: given a sequence of access probabilities $(p_1, p_2, \ldots, p_n)$, construct a binary prefix code $(w_1, w_2, \ldots, w_n)$ minimizing the expected length $\sum_i p_i \cdot \mathrm{length}(w_i)$. A binary prefix code is a set of binary strings, none of which is a prefix of another.

A natural generalization of the problem is to allow the words of the code to be strings over an arbitrary alphabet of $r \ge 2$ letters and to allow each letter to have an arbitrary non-negative length. The length of a codeword is then the sum of the lengths of its letters. For instance, the "dots and dashes" of Morse code are a variable-length alphabet, with length corresponding to transmission time. (See Figure 2.1.) This generalization of Huffman coding to a variable-length alphabet has been considered by many authors, including Altenkamp and Mehlhorn [1] and Karp [?]. Apparently no polynomial-time algorithm for it is known, nor is it known to be NP-hard.

A prefix code in which the codewords $(w_1, w_2, \ldots, w_n)$ are in alphabetical order is called alphabetic [?]. In this case the underlying tree represents an $r$-ary search tree. The length of the $i$th letter corresponds to the time required to descend from a node into its $i$th subtree. This time is often a function of $i$ in search-tree algorithms, for instance, when the subtree to descend into is chosen by sequential search. An optimal alphabetic code thus corresponds to a minimum expected-cost search tree.

In this paper we consider the special case in which the codewords occur with equal probability, i.e., each $p_i$ equals $1/n$. With this restriction, the alphabetic and non-alphabetic problems are equivalent. The problem may be viewed as a variant of Huffman coding in which the lengths of the letters, rather than the codeword probabilities, are non-uniform. Alternatively, it may be viewed as the problem of finding an optimal $r$-ary search tree, where the search queries are uniformly distributed but the time to descend from a parent to its $i$th child depends on $i$. For the complexity results stated in this paper, the algorithms return a tree representing an optimal code.

In 1989, Kapoor and Reingold [?] described a simple $O(n)$-time algorithm for the binary case $r = 2$. In 1975, Perl, Garey, and Even [?] gave an $O(rn \min\{r, \log n\})$-time algorithm. (Due to a typographical error, their abstract incorrectly claims an $O(rn)$-time algorithm.) In the same year Cot [?] described an $O(r^2 n)$-time algorithm. In 1971, Varn [8] gave an algorithm without analyzing its complexity; it appears that Varn's algorithm requires $\Omega(rn)$ time.

Fig. 2.1. Two trees for the 6 symbols a, b, c, d, e, f, each occurring with probability 1/6. The tree on the left is the optimal tree that uses the alphabet {0, 1}, length(0) = length(1) = 1, while the tree on the right is for the alphabet {., _} with length(.) = 1 and length(_) = 2. The corresponding sets of codewords are

a = 000, b = 001, c = 010, d = 011, e = 10, f = 11

and

a = ...., b = ..._, c = .._, d = ._, e = _., f = __

In this paper we describe an $O(n \log^2 r)$-time algorithm based on new insights into
the structure of optimal trees. In Section 2 we define shallow and proper trees and 
prove that some proper shallow tree is optimal. In Section 3 we develop the algorithm, 
which efficiently constructs all proper shallow trees and returns one representing an 
optimal prefix code. 

2. Shallow Trees. Fix an instance of the problem, given by the respective lengths $(c_1 \le c_2 \le \cdots \le c_r)$ of the $r$ letters in the alphabet and the number $n$ of (equiprobable and prefix-free) codewords required. We assume the standard tree representation of prefix codes, as described in the following definition.

Definition 2.1. The infinite $r$-ary tree is the infinite rooted tree in which every node has exactly $r$ children. Each tree edge has a length and a label: an edge going from a node to its $i$th child has length $c_i$ and is labeled with the $i$th letter in the alphabet.

Throughout, a node means a node of the infinite $r$-ary tree. The finite words over the alphabet of $r$ letters correspond to the nodes: the labels along the path from the root to any node spell the corresponding word, and the length of the path is the length of this word. A prefix code corresponds to a set of nodes none of which is a descendant of another. (See Figure 2.1.)

Definitions 2.2. A tree is any subtree $T$ of the infinite $r$-ary tree containing the root. In any tree, $n$ of the leaves will be identified as terminals; their corresponding words form a prefix code. The remaining nodes in the tree are referred to as non-terminals.

Given a node $u$, the notation $\mathrm{child}_i(u)$ denotes $u$'s $i$th child, $\mathrm{depth}(u)$ denotes the depth of $u$ (the length of the corresponding codeword), and $\mathrm{parent}(u)$ denotes the parent of $u$.

The cost c(T) of such a tree is the sum of the depths of the terminals — also 
called the external weighted path length of the tree. 
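To make the cost function concrete, the following Python sketch (our own illustration; the helper names `word_cost`, `is_prefix_free`, and `tree_cost` are hypothetical) treats each codeword as a node of the infinite tree, takes its depth to be the sum of its letter lengths, and computes $c(T)$ for the two codeword sets of Figure 2.1. Dividing by $n$ gives the average codeword length.

```python
def word_cost(word, letter_cost):
    """Depth of the node spelled by `word`: the sum of its letters' lengths."""
    return sum(letter_cost[ch] for ch in word)


def is_prefix_free(words):
    """True if no codeword is a proper prefix of another codeword."""
    return not any(a != b and b.startswith(a) for a in words for b in words)


def tree_cost(words, letter_cost):
    """c(T): the sum of the depths of the terminals (the codewords)."""
    if not is_prefix_free(words):
        raise ValueError("not a prefix code")
    return sum(word_cost(w, letter_cost) for w in words)


# The two codes of Figure 2.1 (n = 6 equiprobable words).
binary = ["000", "001", "010", "011", "10", "11"]
dots_dashes = ["....", "..._", ".._", "._", "_.", "__"]
print(tree_cost(binary, {"0": 1, "1": 1}))       # 16
print(tree_cost(dots_dashes, {".": 1, "_": 2}))  # 23
```

The quadratic prefix test is only meant for small examples such as these.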

A proper tree is a tree in which every non-terminal has at least two children. The 
goal is to find an optimal tree with n terminals. It is easy to see that some optimal 
tree is proper; thus, we restrict our attention to proper trees. 

Our basic tool for understanding the structure of optimal trees is a swapping 
argument. For example, in any proper optimal tree, no non-terminal is deeper than 
any terminal. Otherwise, the terminal and the subtree rooted at the non-terminal 
could be swapped, decreasing the average depth of the terminals. 

We use a swapping argument to prove that some optimal proper tree has the following form for some $m$. The non-terminals are the $m$ shallowest (i.e., least-depth) nodes of the infinite tree, while the terminals are the $n$ shallowest available children of these nodes in the infinite tree. We call such a tree shallow; here is the precise definition:

Definition 2.3. A tree T is shallow provided that 

(i) for any non-terminal $u \in T$ and any node $w$ (not necessarily in $T$) that is not a non-terminal, $\mathrm{depth}(u) \le \mathrm{depth}(w)$; and

(ii) for any terminal $u \in T$ and any node $w$ that is not in $T$ but is a child of a non-terminal, $\mathrm{depth}(u) \le \mathrm{depth}(w)$.

Note that a non-terminal of an (improper) shallow tree might have no children in the tree. This is why we refer to "terminal" and "non-terminal" nodes in place of the more common "internal nodes" and "leaves".
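Definition 2.3 quantifies over infinitely many nodes $w$, but with non-negative letter lengths it suffices to examine children of non-terminals: the shallowest node outside a parent-closed set of non-terminals is always such a child. The Python sketch below (a hypothetical helper `is_shallow`, not from the paper) checks both conditions for a finite tree whose nodes are tuples of 0-based letter indices. It accepts the two trees of Figure 2.1 (with "." as letter 0 and "_" as letter 1) and rejects a binary tree whose depth-2 non-terminal is deeper than a depth-1 terminal.

```python
def depth(node, c):
    """Depth of a node given as a tuple of 0-based letter indices."""
    return sum(c[i] for i in node)


def is_shallow(nonterminals, terminals, c):
    """Check conditions (i) and (ii) of Definition 2.3 for a finite tree.

    Assumes non-negative letter lengths c and that `nonterminals` is closed
    under taking parents, so the shallowest node that is not a non-terminal
    is a child of a non-terminal; only those children are examined.
    """
    N, T = set(nonterminals), set(terminals)
    kids = {v + (i,) for v in N for i in range(len(c))}
    deepest_nonterminal = max((depth(v, c) for v in N), default=None)
    deepest_terminal = max((depth(v, c) for v in T), default=None)
    # (i) no non-terminal is deeper than any node that is not a non-terminal
    others = [depth(v, c) for v in kids - N]
    if deepest_nonterminal is not None and others and deepest_nonterminal > min(others):
        return False
    # (ii) no terminal is deeper than any absent child of a non-terminal
    absent = [depth(v, c) for v in kids - N - T]
    if deepest_terminal is not None and absent and deepest_terminal > min(absent):
        return False
    return True


left_N = {(), (0,), (1,), (0, 0), (0, 1)}
left_T = {(0, 0, 0), (0, 0, 1), (0, 1, 0), (0, 1, 1), (1, 0), (1, 1)}
right_N = {(), (0,), (0, 0), (0, 0, 0), (1,)}
right_T = {(0, 0, 0, 0), (0, 0, 0, 1), (0, 0, 1), (0, 1), (1, 0), (1, 1)}
print(is_shallow(left_N, left_T, [1, 1]))    # True
print(is_shallow(right_N, right_T, [1, 2]))  # True
print(is_shallow({(), (1,), (1, 1)},
                 {(0,), (1, 0), (1, 1, 0), (1, 1, 1)}, [1, 1]))  # False
```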

As a simple example, consider the basic binary case: $r = 2$, $c_1 = c_2 = 1$. A proper binary tree $T$ is shallow if and only if there is some depth $l$ such that (a) every node $u$ in the infinite tree with $\mathrm{depth}(u) < l$ is a non-terminal in $T$ and (b) all terminals of $T$ are on levels $l$ and $l + 1$. Conditions (a) and (b) are necessary and sufficient for $T$ to have minimum external path length among all binary trees with the same number of leaves; see, e.g., [?, §5.3.1]. So a binary tree has minimum external path length for its number of leaves if and only if it is shallow. For example, the binary tree on the left of Figure 2.1 has minimum external path length among all trees with 6 leaves because it fulfills conditions (a) and (b) with $l = 2$. As we will see later, though, for most values of $r$ and $c_1, \ldots, c_r$, shallowness alone does not imply optimality. However, if a shallow tree has the right number of non-terminals, then it is optimal:

Lemma 2.4. Let $m^*$ be the minimum number of non-terminals in any optimal tree. Then any shallow tree with $m^*$ non-terminals is optimal and proper.

Proof. Fix a shallow tree $T$ with $m^*$ non-terminals. We will show the existence of an optimal tree with the same non-terminals as $T$. Since $T$ is shallow, by property (ii), this will imply $T$ is optimal. By the choice of $m^*$, $T$ is also proper (otherwise there would be an optimal proper tree with fewer non-terminals).

It remains to show the existence of an optimal tree with the same non-terminals as $T$. Let $T^*$ be an optimal (and therefore proper) tree with $m^*$ non-terminals. Let $N$ and $N^*$ be the sets of non-terminals of $T$ and $T^*$, respectively. If $N = N^*$, we are done. Otherwise, let $u$ be a minimum-depth node in $N - N^*$, so that $u$'s parent is in $N^*$. Let $u^*$ be a node in $N^* - N$. Note that, since $T$ is shallow, $\mathrm{depth}(u^*) \ge \mathrm{depth}(u)$, but that, in $T^*$, $u^*$ is a non-terminal (with at least two terminal descendants) while $u$ is either a terminal or not present.

In $T^*$, swap the subtrees rooted at $u$ and $u^*$. Specifically, make $u$ a non-terminal and, for each descendant $v^*$ of $u^*$, delete it and add the corresponding descendant $v$ of $u$. If $v^*$ was a terminal, make $v$ a terminal; otherwise make $v$ a non-terminal. If $u$ was a terminal, make $u^*$ a terminal; otherwise delete $u^*$. Call the resulting tree $T'$.

Fig. 2.2. The top of a labeled infinite tree with $r = 3$, $c_1 = 2$, $c_2 = 2$, and $c_3 = 5$.

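From $\mathrm{depth}(u^*) \ge \mathrm{depth}(u)$ it follows that $c(T') \le c(T^*)$: each of the (at least two) terminals below $u^*$ moves up by $\mathrm{depth}(u^*) - \mathrm{depth}(u)$, while at most one terminal (the former $u$, now $u^*$) moves down by the same amount. Thus $T'$ is also optimal. Note that $T'$ shares one more non-terminal with $T$ than does $T^*$. Thus, repeated swapping produces an optimal tree with the same non-terminals as $T$. □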

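Note that $m^* \ge (n-1)/(r-1)$, since each node has at most $r$ children: a tree with $m$ non-terminals and $n$ terminals has $m + n - 1$ non-root nodes, each a child of one of the $m$ non-terminals, so $m + n - 1 \le rm$.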

Corollary 2.5. Let $m_{\min} = \lceil (n-1)/(r-1) \rceil$. Let $(T_{m_{\min}}, T_{m_{\min}+1}, T_{m_{\min}+2}, \ldots)$ be any sequence of shallow trees such that for each $m$, $T_m$ has $m$ non-terminals. Then one of the $T_m$ is proper and optimal.

The algorithm generates a sequence of shallow trees as above and returns the one that has minimum cost. Corollary 2.5 guarantees that this tree will be optimal. The rest of the paper is devoted to examining the properties of shallow trees that enable the enumeration of the proper shallow trees in $O(n \log^2 r)$ time.



2.1. Defining the Trees. 
Ordering the nodes. Label the nodes of the infinite tree as $1, 2, 3, \ldots$ in order of increasing depth. Break ties arbitrarily, except that if two nodes $u$ and $w$ are of equal depth, both are $i$th children of their respective parents, and $\mathrm{parent}(u) < \mathrm{parent}(w)$, then let $u < w$ (this is needed for Lemma 3.2). For the sake of notation, identify each node with its label, so that 1 is the root, 2 is a minimum-depth child of the root, etc. Figure 2.2 illustrates the top section of such a labeling for $r = 3$, $c_1 = 2$, $c_2 = 2$, and $c_3 = 5$. These values of $r$ and $c_i$ are the ones we use in all later examples.
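One way to produce such a labeling, sketched below in Python under the assumption that every letter length is positive, is a best-first traversal of the infinite tree: repeatedly pop the node with the smallest key (depth, parent label, child index) from a heap, give it the next label, and push its $r$ children. This realizes the lexicographic tie-breaking rule described in Section 3.3, which satisfies the constraint above. The function name `label_nodes` is ours.

```python
import heapq


def label_nodes(c, count):
    """Label the first `count` nodes of the infinite r-ary tree (c = lengths).

    Nodes leave the heap in order of (depth, parent label, child index):
    by depth, with ties broken lexicographically on (parent, i). Assumes
    every c[i] > 0, so all nodes of a given depth are already in the heap
    when the first node of that depth is popped.
    Returns lists depth, parent, index indexed by label (entry 0 unused;
    the root, label 1, has parent 0 and index 0).
    """
    depth, parent, index = [None], [None], [None]
    heap = [(0, 0, 0)]  # the root
    while len(depth) <= count:
        d, p, i = heapq.heappop(heap)
        depth.append(d)
        parent.append(p)
        index.append(i)
        label = len(depth) - 1
        for j, cj in enumerate(c, start=1):  # push child_j(label)
            heapq.heappush(heap, (d + cj, label, j))
    return depth, parent, index


depth, parent, index = label_nodes([2, 2, 5], 8)
print(depth[1:])   # [0, 2, 2, 4, 4, 4, 4, 5]
print(parent[1:])  # [0, 1, 1, 2, 2, 3, 3, 1]
print(index[1:])   # [0, 1, 2, 1, 2, 1, 2, 3]
```

So for the running example, nodes 2 and 3 are the first two children of the root, nodes 4 to 7 are the first two children of nodes 2 and 3, and node 8 is the root's third child.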

Definition 2.6. For each $m \ge m_{\min}$ define $T_m$ to be the tree whose non-terminals are $\{1, \ldots, m\}$ and whose terminals are the minimum $n$ nodes among the children of $\{1, \ldots, m\}$ in $\{m+1, m+2, \ldots\}$. Thus $T_m$ is the "shallowest" tree with $m$ non-terminals with respect to the ordering of the nodes. Since the ordering of the nodes respects depth, each $T_m$ is shallow. Figure 2.3 presents $T_5$, $T_6$, $T_7$, and $T_8$ for $n = 10$ using the labeling of Figure 2.2.
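Definition 2.6 translates directly into code. The Python sketch below (our own, with hypothetical names; it repeats a compact labeling helper so that it runs on its own, and assumes positive letter lengths) builds $T_m$ by taking the $n$ smallest children of $\{1, \ldots, m\}$ that are not themselves non-terminals, and reports the cost together with the number of children of each non-terminal. Children are compared by (depth, parent label, child index), so the selection follows the lexicographic labeling of Section 3.3.

```python
import heapq


def first_labels(c, count):
    """(depth, parent, child index) of nodes 1..count in label order.

    Best-first labeling with lexicographic tie-breaking; assumes positive
    letter lengths c. Entry 0 is unused.
    """
    heap, out = [(0, 0, 0)], [None]
    while len(out) <= count:
        d, p, i = heapq.heappop(heap)
        out.append((d, p, i))
        for j, cj in enumerate(c, start=1):
            heapq.heappush(heap, (d + cj, len(out) - 1, j))
    return out


def shallow_tree(c, n, m):
    """T_m of Definition 2.6: returns (c(T_m), children), or None if impossible.

    children[v] is the number of children of non-terminal v in T_m.
    Candidate terminals are keyed by (depth, parent, child index), which
    orders them exactly as their labels would be ordered.
    """
    nodes = first_labels(c, m)
    inside = {(p, i) for (_, p, i) in nodes[1:]}  # positions of nodes 1..m
    cand = [(nodes[v][0] + cj, v, j)
            for v in range(1, m + 1)
            for j, cj in enumerate(c, start=1)
            if (v, j) not in inside]
    if len(cand) < n:
        return None
    terminals = heapq.nsmallest(n, cand)
    children = [0] * (m + 1)
    for _, p, _ in nodes[2:] + terminals:  # non-terminal and terminal children
        children[p] += 1
    return sum(d for d, _, _ in terminals), children


for m in range(5, 9):
    cost, kids = shallow_tree([2, 2, 5], 10, m)
    print(m, cost, kids[m])  # prints 5 60 2, 6 59 2, 7 60 2, 8 62 0 (one per line)
```

The costs agree with those given for Figure 2.3, and the last column shows that node 8 has no children in $T_8$.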



2.2. Relation of Successive Trees. Next we turn our attention to the relation of $T_{m+1}$ to $T_m$.

Lemma 2.7. For $m \ge m_{\min}$, the new non-terminal (node $m+1$) in $T_{m+1}$ is the minimum terminal of $T_m$.

Proof. The parent of $m+1$ is in $\{1, \ldots, m\}$, so $m+1$ is the minimum child of $\{1, \ldots, m\}$ in $\{m+1, m+2, \ldots\}$. The result follows from the definition of $T_m$. □

Lemma 2.8. For $m \ge m_{\min}$, provided the new non-terminal (node $m+1$) in $T_{m+1}$ has at least one child, each terminal of $T_{m+1}$ is either a child of $m+1$ or a terminal of $T_m$.

Proof. Let node $m+1$ have $d$ children in $T_{m+1}$. Let $C$ denote the set of children of nodes $\{1, \ldots, m\}$ in $\{m+1, m+2, \ldots\}$. The terminals of tree $T_{m+1}$ consist of the minimum $d$ children of node $m+1$ together with the minimum $n - d$ nodes in $C - \{m+1\}$. These $n - d$ nodes, together with node $m+1$ (the minimum node in $C$), are the $n - d + 1$ minimum nodes in $C$. If $d \ge 1$, then $n - d + 1 \le n$, so by the definition of $T_m$ each such node is a terminal in $T_m$. □



The main significance of Lemmas 2.7 and 2.8 is that they allow an efficient construction of $T_{m+1}$ from $T_m$. Moreover, they imply that, if $T_m$ is not proper, neither is any subsequent tree.

Lemma 2.9. One of the trees $(T_{m_{\min}}, T_{m_{\min}+1}, \ldots, T_{m_{\max}})$ is optimal and proper, where $m_{\max} = \min\{m : T_{m+1} \text{ is improper}\}$.



Proof. By Lemma 2.8, if $T_m$ is improper, then so is $T_{m+1}$: either node $m+1$ has no children in $T_{m+1}$, or the non-terminal in $T_m$ that had fewer than two children also has fewer than two children in $T_{m+1}$. Hence, for each $m > m_{\max}$, tree $T_m$ is improper. Thus Corollary 2.5 implies that one of the trees $(T_{m_{\min}}, T_{m_{\min}+1}, \ldots, T_{m_{\max}})$ is proper and optimal. □

For $n = 10$, $m_{\min} = \lceil 9/2 \rceil = 5$ and (as shown in Figure 2.3) $T_8$ is improper. The lemma then implies that one of $T_5$, $T_6$, or $T_7$ must have minimum external path length. Calculation shows that $T_6$, with $c(T_6) = 59$, is the optimal one.
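The claim that $c(T_6) = 59$ is optimal can be cross-checked by brute force for small $n$. The sketch below is not the paper's method and takes exponential time; it is a dynamic program over proper trees that we add only as a check. It relies on an exchange argument in the spirit of the swapping arguments above (an assumption stated in the docstring): some optimal tree gives each non-terminal children on its cheapest letters, with larger subtrees under cheaper letters. The name `optimal_cost` is hypothetical.

```python
from functools import lru_cache


def optimal_cost(c, n):
    """Exhaustive minimum of c(T) over all trees with n terminals.

    Brute-force cross-check (exponential time), not the paper's algorithm.
    Assumes an exchange argument: some optimal proper tree gives every
    non-terminal children on its d cheapest letters (2 <= d <= r), with
    larger subtrees under cheaper letters. Hence
    OPT(k) = min over k = k_1 >= ... >= k_d of sum_j (k_j * c_j + OPT(k_j)).
    """
    c = sorted(c)

    @lru_cache(maxsize=None)
    def opt(k):
        if k == 1:
            return 0
        best = float("inf")

        def extend(rem, cap, parts):
            nonlocal best
            if rem == 0:  # parts is a partition of k into at least two parts
                best = min(best, sum(p * c[j] + opt(p) for j, p in enumerate(parts)))
                return
            if len(parts) == len(c):  # no letters left for the remainder
                return
            for p in range(min(rem, cap), 0, -1):
                extend(rem - p, p, parts + [p])

        extend(k, k - 1, [])  # the cap k - 1 forces at least two parts
        return best

    return opt(n)


print(optimal_cost([2, 2, 5], 10))  # 59
print(optimal_cost([1, 2], 6))      # 23
```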

3. Computing the Trees. The algorithm uses the following two operations to 
compute the trees. 

To Sprout a tree is to make its minimum terminal a non-terminal and to add the 
minimum child of this non-terminal as a terminal. 

To Level a tree is to add $k$ children of the maximum non-terminal to the tree as terminals and to remove the $k$ largest terminals in the tree. The $k$ children are the minimum $k$ children not yet in the tree, where $k$ is maximum such that all children added are less than all terminals deleted.
Fig. 2.3. The trees $T_5$, $T_6$, $T_7$, and $T_8$ for $r = 3$, $c_1 = 2$, $c_2 = 2$, $c_3 = 5$, and $n = 10$. The node numbering is that of the previous figure. Calculating the external path lengths, we find that $c(T_5) = 60$, $c(T_6) = 59$, $c(T_7) = 60$, and $c(T_8) = 62$.



The algorithm computes the initial tree $T_{m_{\min}}$, then repeatedly Sprouts and Levels to obtain successive trees until the tree so obtained is not proper. Lemmas 2.7 and 2.8 imply that, as long as node $m+1$ has at least one child in $T_{m+1}$ (it will if $T_{m+1}$ is proper), Sprouting and Leveling $T_m$ yields $T_{m+1}$. Figure 3.1 illustrates this operation.

Observation 3.1. Let $m = m_{\max}$. If node $m+1$ has at least one child in $T_{m+1}$, then Sprouting and Leveling $T_m$ yields tree $T_{m+1}$. If node $m+1$ has no children in $T_{m+1}$, then the maximum terminal in $T_m$ is less than the minimum child of node $m+1$, and Sprouting and Leveling $T_m$ yields a tree in which non-terminal $m+1$ has one child. Hence, the algorithm always correctly identifies $T_{m_{\max}}$ and terminates correctly, having considered all relevant trees.
Fig. 3.1. Sprouting and Leveling $T_5$ yields $T_6 = \mathrm{Level}(\mathrm{Sprout}(T_5))$.



Sprouting requires identifying and converting the minimum terminal of the current tree, whereas Leveling requires identifying and replacing (no more than $r$) maximum terminals by children of the new non-terminal. One could identify the maximum and minimum terminals in $O(\log n)$ time by storing all terminals in two standard priority queues (one to detect the minimum, the other to detect the maximum). At most $r$ terminals would be replaced in computing each tree and, because $m_{\max} \le n - 1$, only $O(n)$ trees would be computed. This approach yields an $O(rn \log n)$-time algorithm.

By a more careful use of the structure of the trees, we improve this in two ways. First, we give an amortized analysis showing that in total only $O(n \log r)$, rather than $O(rn)$, terminals are replaced. Second, we show how to reduce the number of terminals in each priority queue to at most $r$. This yields an $O(n \log^2 r)$-time algorithm.

Both improvements follow from the tie-breaking condition on the ordering of the nodes, which guarantees that $T_m$ must have the following structure.

Lemma 3.2. In any $T_m$, if $u$ and $w$ are non-terminals with $u < w$, and the $i$th child of $w$ is in the tree, then so is the $i$th child of $u$. If the $i$th child of $w$ is a non-terminal, then so is the $i$th child of $u$.

Proof. Straightforward from the definition of $T_m$ and the condition on breaking ties in ordering the nodes (in §2.1). □

Corollary 3.3. Node $m$ has a minimum number of children among all non-terminals in $T_m$.

3.1. Only $O(n \log r)$ Replacements Total. The number of terminals replaced while obtaining $T_m$ from $T_{m-1}$ is at most the number of children of non-terminal $m$ in $T_m$. Although this might be $r$ for many $m$, the sum of the numbers of children is $O(n \log r)$:

Lemma 3.4. Let $d_m$ be the number of children of non-terminal $m$ in tree $T_m$. Then $\sum_{m=m_{\min}}^{m_{\max}} d_m$ is $O(n \log r)$.

Proof. By Corollary 3.3, within $T_m$, node $m$ has the fewest children. The total number of children of the $m$ non-terminals is $m + n - 1$. Thus $d_m$ is at most the average $(m + n - 1)/m = 1 + (n-1)(1/m)$. Summing,

$$\sum_{m=m_{\min}}^{m_{\max}} d_m \;\le\; (m_{\max} - m_{\min} + 1) + (n-1) \sum_{m=m_{\min}}^{m_{\max}} \frac{1}{m} \;=\; O\bigl(m_{\max} - m_{\min} + n \log(m_{\max}/m_{\min})\bigr).$$

The result follows from $m_{\min} = \lceil (n-1)/(r-1) \rceil$ and $m_{\max} \le n - 1$. □

3.2. Limiting the Relevant Terminals. To reduce the number of terminals that must be considered in finding the minimum and maximum terminals, we partition the terminals into $r$ groups. The $i$th group consists of the terminals that are $i$th children ($i = 1, \ldots, r$).

Lemma 3.5. In any $T_m$, for any $i$, the set of non-terminals whose $i$th children are terminals is of the form $\{u_i, u_i + 1, \ldots, w_i\}$ for some $u_i$ and $w_i$. The minimum among terminals that are $i$th children is $\mathrm{child}_i(u_i)$ (the $i$th child of $u_i$). The maximum among these terminals is $\mathrm{child}_i(w_i)$.

Proof. A straightforward consequence of Lemma 3.2. □



Figure 2.3 presents $u_i$ and $w_i$ for the trees $T_5$, $T_6$, $T_7$, and $T_8$ when $n = 10$.

This lemma implies that the minimum terminal in $T_m$ is the minimum among $\{\mathrm{child}_i(u_i) : i = 1, \ldots, r\}$. Our algorithm finds the minimum terminal in $T_m$ by maintaining these $r$ particular children (rather than all $n$ terminals) in a priority queue. This reduces the cost of finding the minimum from $O(\log n)$ to $O(\log r)$. Similarly, the algorithm finds the maximum terminal in $O(\log r)$ time by maintaining $\{\mathrm{child}_i(w_i) : i = 1, \ldots, r\}$ in an additional priority queue.

Observation 3.6.¹ As an aside, one can prove using Lemma 3.? that, for any $m$ such that $m_{\min} < m < m_{\max}$, $c(T_{m+1}) - c(T_m) \ge c(T_m) - c(T_{m-1})$. That is, the sequence of tree costs is unimodal. To prove this, consider building $T_{m+1}$ from $T_m$. Sprouting increases the cost by $c_1$; Leveling decreases the cost with each swap. For each swap in building $T_{m+1}$ from $T_m$, one can show there was a corresponding swap in building $T_m$ from $T_{m-1}$, and that the decrease in cost (from $T_m$ to $T_{m+1}$) due to the former is bounded by the decrease in cost (from $T_{m-1}$ to $T_m$) due to the latter. Thus, in practice the algorithm could be modified to stop and return $T_{m-1}$ when $c(T_m) > c(T_{m-1})$.



1 This observation is due to R. Fleischer. 
3.3. The Algorithm in Detail. The full algorithm has two distinct phases. The first phase constructs the base tree $T_{m_{\min}}$. The second phase starts with $T_{m_{\min}}$ and, by Sprouting and Leveling, iteratively constructs the sequence of shallow trees $(T_{m_{\min}}, T_{m_{\min}+1}, \ldots, T_{m_{\max}})$ and returns one that has smallest external path length. $T_{m_{\max}}$ is the last proper tree in the sequence, i.e., $T_{m_{\max}+1}$ is improper. Lemma 2.9 guarantees that the algorithm returns an optimal tree. We now describe how to implement the first phase in $O(n \log r)$ time and the second in $O(n \log^2 r)$ time; the full algorithm therefore runs in $O(n \log^2 r)$ time.



The skeleton of the final algorithm is shown in Fig. 3.2. Procedure Create-$T_{m_{\min}}$ creates tree $T_{m_{\min}}$; the variable $C$ contains the external path length of the current tree $T_m$, and mDeg contains the number of children of node $m$ in tree $T_m$. As presented, the algorithm computes only the cost of an optimal tree; it can easily be modified to compute the actual tree. Note that to check that the current tree $T_m$ is proper, by Observation 3.1 and Corollary 3.3 it suffices to check that non-terminal $m$ has at least two children.

COMPUTE-TREES((c_1, c_2, ..., c_r), n)
  1. CREATE-T_mmin; C_min <- C
  2. WHILE (mDeg ≥ 2) DO
       -- Compute T_{m+1} from T_m
  3.     Sprout(T)
  4.     Level(T)
  5.     C_min <- min{C, C_min}
  6. RETURN C_min

Fig. 3.2. Algorithm to find an optimal variable-length prefix code.



The routines Sprout and Level are shown in Figure 3.3. 

Recall that the nodes of the infinite tree are labeled in order of increasing depth, with ties broken arbitrarily except for the requirement that if $u$ and $v$ are of equal depth and both are $i$th children of their respective parents, then $u < v$ iff $\mathrm{parent}(u) < \mathrm{parent}(v)$. Depending upon $c_1, c_2, \ldots, c_r$, there may be many such labelings. The algorithm we present breaks ties lexicographically: suppose $u$ and $v$ have the same depth, and let $u = \mathrm{child}_i(u')$ and $v = \mathrm{child}_j(v')$; then $u < v$ iff $u' < v'$ (or $u' = v'$ and $i < j$). Figure 2.2 illustrates this labeling for $r = 3$, $c_1 = 2$, $c_2 = 2$, and $c_3 = 5$. The sequence of shallow trees is fully determined by this labeling. Figure 2.3 illustrates the shallow trees with 10 terminals for these $r$ and $c$ values.

The algorithm represents the current tree $T_m$ with the following data structures:
N: The number of terminals.
m: The number of non-terminals; also the rank of the maximum non-terminal.
C: The sum of the depths of the terminals.
mDeg: The number of children of non-terminal m.
D[u]: The depth of each non-terminal u.
u[i]: The rank of the minimum non-terminal (if any) whose ith child is a terminal (1 ≤ i ≤ r).
w[i]: The rank of the maximum non-terminal (if any) whose ith child is a terminal (1 ≤ i ≤ r). If no non-terminal has a terminal ith child, then u[i] > w[i].
Sprout(T)
     -- Make the minimum terminal a non-terminal --
  1. m <- m + 1
  2. Let child_i(u[i]) be the minimum terminal in low-queue.
  3. D[m] <- D[u[i]] + c_i; u[i] <- u[i] + 1; Update-Qs(T, i)
  4. C <- C - D[m]; mDeg <- 0
     -- Add smallest child as a terminal --
  5. Add-Terminal(T)

Level(T)
  1. WHILE (mDeg < r and child_{mDeg+1}(m) is less than
           the maximum terminal child_i(w[i]) in high-queue) DO
  2.   Add-Terminal(T)
       -- Delete the maximum terminal --
  3.   C <- C - (D[w[i]] + c_i)
  4.   w[i] <- w[i] - 1; Update-Qs(T, i)

Add-Terminal(T)
  1. mDeg <- mDeg + 1; C <- C + D[m] + c_mDeg
  2. w[mDeg] <- m; Update-Qs(T, mDeg)

Fig. 3.3. The operations Sprout and Level.



low-queue: A priority queue for finding the minimum terminal. Contains $\{\mathrm{child}_i(u[i]) : u[i] \le w[i]\}$.
high-queue: A priority queue for finding the maximum terminal. Contains $\{\mathrm{child}_i(w[i]) : u[i] \le w[i]\}$.
For an example, refer back to Figure 2.3. Tree $T_6$ has

N = 10, C = 59, mDeg = 2,
D[1] = 0, D[2] = 2, D[3] = 2, D[4] = 4, D[5] = 4, D[6] = 4,
u[1] = 4, u[2] = 3, u[3] = 1, w[1] = 6, w[2] = 6, w[3] = 3,
low-queue = {child_1(4), child_2(3), child_3(1)},
high-queue = {child_1(6), child_2(6), child_3(3)}.
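The group structure of Lemma 3.5 can be checked on these numbers. The Python sketch below (variable names mirror the text; entry 0 of each list is unused, and $D[3] = 2$ because node 3 is the second child of the root, at depth $c_2 = 2$) expands the $r$ groups into the ten terminals of $T_6$, confirms that their depths sum to $C = 59$, and checks that the smallest low-queue entry and the largest high-queue entry are the overall minimum and maximum terminals. Terminals are compared by the key (depth, parent rank, child index), which is the lexicographic order used by the algorithm.

```python
# Data for T_6 with r = 3, c = (2, 2, 5), n = 10; lists are 1-based.
c = [None, 2, 2, 5]
D = [None, 0, 2, 2, 4, 4, 4]
u = [None, 4, 3, 1]
w = [None, 6, 6, 3]


def key(v, i):
    """Sort key of child_i(v): (depth, parent rank, child index)."""
    return (D[v] + c[i], v, i)


# Lemma 3.5: group i consists of child_i(v) for v = u[i], ..., w[i].
terminals = [key(v, i) for i in range(1, 4) for v in range(u[i], w[i] + 1)]
low = [key(u[i], i) for i in range(1, 4) if u[i] <= w[i]]   # low-queue contents
high = [key(w[i], i) for i in range(1, 4) if u[i] <= w[i]]  # high-queue contents

print(len(terminals), sum(d for d, _, _ in terminals))          # 10 59
print(min(low) == min(terminals), max(high) == max(terminals))  # True True
print(min(low), max(high))                                      # (4, 3, 2) (7, 3, 3)
```

So the next Sprout converts $\mathrm{child}_2(3)$, and the first terminal a subsequent Level could remove is $\mathrm{child}_3(3)$.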

The priority queues are maintained as follows. In general, a terminal in $T_m$ can have rank (label) arbitrarily larger than $m$. The algorithm explicitly maintains the ranks and depths of the $m$ non-terminals in the current tree; it compares the ranks of terminals in the priority queues via the ranks and depths of their (non-terminal) parents. When u[i] or w[i] changes to reflect a new current tree, the queues are updated by the following routine:
Update-Qs(T, i)
  1. IF (u[i] ≤ w[i]) THEN
  2.   Update child_i(u[i]) in low-queue and child_i(w[i]) in high-queue
       to maintain the queues' invariants.
  3. ELSE Delete both nodes from their respective queues.

Line 2 replaces the old child_i(u[i]) in low-queue (child_i(w[i]) in high-queue) by the new one when u[i] (w[i]) changes. Line 3 will only be executed if $\mathrm{child}_i(u[i]) > \mathrm{child}_i(w[i])$, that is, if $u[i] > w[i]$, which happens only when the tree no longer contains any $i$th child as a terminal. Note that Lemmas 2.8 and 3.2 imply that if, for some $i$ and $T_m$, no non-terminal has an $i$th child in $T_m$, then no non-terminal has an $i$th child in $T_{m+1}$.

Construction of the First Tree. Tree $T_{m_{\min}}$ has a simple structure. Its non-terminals are the nodes $(1, 2, \ldots, m_{\min})$. Its terminals are the $n$ shallowest children of nodes $(1, 2, \ldots, m_{\min})$.

To construct $T_{m_{\min}}$ we assume that $n > r$; otherwise $T_{m_{\min}}$ is simply the root and its first $n$ children. For $1 \le m < m_{\min}$, define $T_m$ to be the tree with non-terminals $\{1, \ldots, m\}$ and all of the $(r-1)m + 1$ children of $\{1, \ldots, m\}$ as terminals. The proof of Lemma 2.7 generalizes easily to these trees: node $m+1$ is the minimum terminal of $T_m$.



CREATE-T_mmin(T)
     -- Create T_1 --
  1. m_min = ⌈(n-1)/(r-1)⌉; D[1] = 0; C = Σ_{i=1}^{min{r,n}} c_i
  2. CREATE low-queue; CREATE high-queue
  3. FOR i = 1 TO min{r, n} DO
  4.   u[i] <- w[i] <- 1; Update-Qs(T, i)
     -- Create (T_2, T_3, ..., T_{m_min - 1}) --
  5. FOR m = 2 TO m_min - 1 DO
  6.   Let child_i(u[i]) be the minimum terminal in low-queue.
  7.   D[m] <- D[u[i]] + c_i; u[i] <- u[i] + 1; Update-Qs(T, i)
  8.   FOR j = 1 TO r DO
  9.     w[j] <- m; Update-Qs(T, j)
  10.  C <- C - D[m] + Σ_{j=1}^{r} (D[m] + c_j)
     -- Create T_{m_min} --
  11. m = m_min; A = n - (r-1)(m_min - 1)
  12. Let child_i(u[i]) be the minimum terminal in low-queue.
  13. D[m] <- D[u[i]] + c_i; u[i] <- u[i] + 1; Update-Qs(T, i)
  14. FOR j = 1 TO A DO
  15.   w[j] <- m; Update-Qs(T, j)
  16. C <- C - D[m] + Σ_{j=1}^{A} (D[m] + c_j)
  17. mDeg = A
  18. Level(T)

Fig. 3.4. Operation Create-T_mmin.



The tree $T_1$ is easy to construct: it is the root together with its $r$ children. Inductively construct the tree $T_m$ from the tree $T_{m-1}$, $m < m_{\min}$, as follows: find the minimum terminal in $T_{m-1}$ by taking the minimum terminal out of low-queue. Label this node $m$, make it a non-terminal, and add all of its children to $T_m$ as terminals. The details are shown in Fig. 3.4.

Finally, construct $T_{m_{\min}}$ from $T_{m_{\min}-1}$ by making the minimum terminal of $T_{m_{\min}-1}$ into node $m_{\min}$. Add the $n - (r-1)(m_{\min} - 1)$ minimum children of node $m_{\min}$ as terminals, bringing the total number of terminals in the current tree to $n$. Level the resulting tree.

Since only $O(n/r)$ trees are constructed while computing $T_{m_{\min}}$ and each tree can be constructed from the previous tree in $O(r \log r)$ time, the time required to compute $T_{m_{\min}}$ is $O(n \log r)$. (If desired, the time for each tree $T_m$ with $m < m_{\min}$ can be reduced to $O(\log r)$, because maximum terminals are not replaced in constructing such a tree.)

Construction of the Remaining Trees. The algorithm constructs the sequence of trees $(T_{m_{\min}}, T_{m_{\min}+1}, T_{m_{\min}+2}, \ldots, T_{m_{\max}})$ as described previously. Tree $T_m$ is found by Sprouting and then Leveling its predecessor $T_{m-1}$. The cost is $O(d_m \log r)$ time, where $d_m$ is the number of children of the new non-terminal $m$ in $T_m$. By Lemma 3.4 this part of the algorithm runs in $O((\sum_m d_m) \log r) = O(n \log^2 r)$ time.
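Putting Figures 3.2 to 3.4 together, the following Python function is a compact sketch of the whole algorithm that, like Figure 3.2, returns only the optimal cost. It is our own transcription, not a reference implementation, and it makes several simplifying assumptions: letter lengths are positive, $r \ge 2$, the case $n \le r$ is answered directly (the root and its $n$ cheapest children), and the two size-$r$ priority queues are replaced by linear scans over the $r$ group representatives of Lemma 3.5, so each query costs $O(r)$ rather than $O(\log r)$. Swapping those scans for indexed heaps should recover the stated bound.

```python
def compute_trees(c, n):
    """Optimal cost c(T) for n equiprobable codewords with letter lengths c.

    Sketch of Figs. 3.2 to 3.4 assuming positive lengths, r >= 2, n >= 2.
    Linear scans over the r group representatives (Lemma 3.5) stand in
    for low-queue and high-queue.
    """
    r = len(c)
    c = [0] + sorted(c)                  # 1-based: c[1] <= ... <= c[r]
    if n <= r:
        return sum(c[1:n + 1])           # the root and its n cheapest children
    D = [0, 0]                           # D[v]: depth of non-terminal v
    u = [0] + [1] * r                    # child_i(v) is a terminal exactly
    w = [0] + [1] * r                    #   for u[i] <= v <= w[i]
    m, mdeg, C = 1, r, sum(c[1:])        # T_1: the root with all r children

    def key(v, i):                       # child_i(v), ordered as its label
        return (D[v] + c[i], v, i)

    def min_terminal():                  # replaces low-queue
        return min(key(u[i], i) for i in range(1, r + 1) if u[i] <= w[i])

    def max_terminal():                  # replaces high-queue
        return max(key(w[i], i) for i in range(1, r + 1) if u[i] <= w[i])

    def new_nonterminal():               # minimum terminal becomes node m + 1
        nonlocal m, C
        _, v, i = min_terminal()
        m += 1
        D.append(D[v] + c[i])
        u[i] += 1
        C -= D[m]

    def add_terminal():                  # Add-Terminal: next child of node m
        nonlocal mdeg, C
        mdeg += 1
        C += D[m] + c[mdeg]
        w[mdeg] = m

    def level():                         # Level, as in Fig. 3.3
        nonlocal C
        while mdeg < r:
            top = max_terminal()
            if key(m, mdeg + 1) >= top:
                break
            add_terminal()
            _, v, i = top                # delete the maximum terminal
            C -= D[v] + c[i]
            w[i] -= 1

    # Create-T_mmin (Fig. 3.4): grow T_2, ..., T_{m_min}, then Level.
    m_min = -(-(n - 1) // (r - 1))       # ceil((n - 1) / (r - 1))
    while m < m_min:
        new_nonterminal()
        mdeg = 0
        kids = r if m < m_min else n - (r - 1) * (m_min - 1)
        for _ in range(kids):
            add_terminal()
    level()

    best = C
    while mdeg >= 2:                     # main loop of Fig. 3.2
        new_nonterminal()                # Sprout ...
        mdeg = 0
        add_terminal()
        level()                          # ... then Level
        best = min(best, C)              # the final improper tree is still a valid code
    return best


print(compute_trees([2, 2, 5], 10))  # 59
```

On the running example the loop visits $T_5$, $T_6$, and $T_7$ (costs 60, 59, 60), then stops after the next Sprout and Level leave node 8 with a single child, as Observation 3.1 predicts.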